Introduction
Direct Preference Optimization (DPO) is a technique for aligning a language model’s outputs with human preferences by fine-tuning it directly on examples of preferred and non-preferred responses. This requires a preference dataset: data that tells the model which responses humans prefer and which they do not. In this article, we’ll walk through a code implementation that creates such a dataset using Python, OpenAI’s API, and Hugging Face’s Datasets library.
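To make the objective concrete, here is a minimal sketch of the per-example DPO loss in plain Python. It assumes the summed log-probabilities of the chosen and rejected responses are already available, both under the model being trained and under a frozen reference model. The function name `dpo_loss`, the default `beta=0.1`, and the numbers passed in are illustrative choices, not part of this article's pipeline.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log(sigmoid(beta * (chosen_margin - rejected_margin)))."""
    # How much more (or less) likely each response is under the policy than the reference.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) equals softplus(-z); computed in a numerically stable form.
    x = -logits
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Hypothetical log-probabilities, for illustration only.
no_preference = dpo_loss(-5.0, -5.0, -5.0, -5.0)
prefers_chosen = dpo_loss(-4.0, -8.0, -5.0, -5.0)
print(no_preference, prefers_chosen)
```

The loss shrinks as the trained model raises the likelihood of the chosen response relative to the rejected one, which is why every training example needs both.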
Components of a Preference Dataset for DPO
A preference dataset typically includes:
Prompts: Inputs or questions given to the AI model.
Chosen Responses: AI-generated responses preferred by human evaluators.
Rejected Responses: Less preferred responses, or responses not selected by human evaluators.
By providing this structure, the dataset allows a model to learn which responses are preferable, making it better aligned with human preferences.
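As a rough illustration, a single record in such a dataset could look like the dictionary below. The field names `prompt`, `chosen`, and `rejected` match the columns this article's code produces; the story text itself is made up for the example and is not taken from the real dataset.

```python
import json

# A made-up record for illustration; the text is not from the real dataset.
record = {
    "prompt": "Write a short story for a 5-year-old about a puppy who finds a ball.",
    "chosen": "Max the puppy found a red ball. He ran and ran with it. Max was happy.",
    "rejected": "The exuberant canine serendipitously encountered a spherical object "
                "and pondered its provenance.",
}

REQUIRED_FIELDS = ("prompt", "chosen", "rejected")

# Every field is a plain string, so the record serializes cleanly to JSON.
print(json.dumps(record, indent=2))
```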
Our use-case
In our previous post, we created an instruction dataset, TinyStories_Instruction, from the raw TinyStories dataset. It was designed for fine-tuning a pretrained large or small language model with LoRA/QLoRA to build a story generator tailored to 5-year-olds.
In this guide, we take the next step by creating a preference dataset from that instruction dataset. The original stories serve as the chosen responses, while deliberately over-complex stories generated with gpt-4o-mini serve as the rejected ones. The resulting dataset is used to fine-tune a pretrained large or small language model through Direct Preference Optimization (DPO), so that our story generator aligns better with human preferences and produces engaging, age-appropriate content for young children.
The process for creating a preference dataset is illustrated below:
Implementation
This implementation involves a series of steps: extracting data, generating AI responses, and creating preference triplets.
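The three steps can be tried offline before any API key is involved. The sketch below mirrors the flow of this section using only the standard library, with a stub in place of the OpenAI call. All names in it (`extract_pairs`, `fake_generate`, `build_triples`, `to_columns`) are hypothetical stand-ins, and the two input rows are invented.

```python
import json
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]  # (instruction, generated story, desired story)


def extract_pairs(rows: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Step 1: pull (instruction, desired story) pairs out of the rows."""
    return [(row["instruction"], row["output"]) for row in rows]


def fake_generate(instruction: str) -> str:
    """Stub for the API call: returns JSON with a 'generated_story' key."""
    return json.dumps({"generated_story": f"An intricate narrative concerning: {instruction}"})


def build_triples(rows: List[Dict[str, str]], generate: Callable[[str], str]) -> List[Triple]:
    """Step 2: generate one story per instruction and pair it with the desired story."""
    triples = []
    for instruction, desired in extract_pairs(rows):
        generated = json.loads(generate(instruction))["generated_story"]
        triples.append((instruction, generated, desired))
    return triples


def to_columns(triples: List[Triple]) -> Dict[str, List[str]]:
    """Step 3: turn triples into prompt / rejected / chosen columns (needs at least one triple)."""
    prompts, rejected, chosen = zip(*triples)
    return {"prompt": list(prompts), "rejected": list(rejected), "chosen": list(chosen)}


rows = [
    {"instruction": "Write a story about a cat.", "output": "The cat sat. The cat was happy."},
    {"instruction": "Write a story about a boat.", "output": "The boat went far. It came home."},
]
columns = to_columns(build_triples(rows, fake_generate))
print(columns["prompt"])
```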
import concurrent.futures
import json
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

from datasets import Dataset, load_dataset, concatenate_datasets
from openai import OpenAI
from tqdm.auto import tqdm
from google.colab import userdata

1. Data Extraction Function
The extract_ground_instruction_story function extracts pairs of instructions and desired outputs from a given dataset.
def extract_ground_instruction_story(dataset):
    return [(example['instruction'], example['output']) for example in dataset]

2. Creating a PreferenceSet Class
The PreferenceSet class manages and stores the triples of (instruction, generated story, desired story).
class PreferenceSet:
    def __init__(self, triples: List[Tuple[str, str, str]]):
        self.triples = triples

    @classmethod
    def from_json(cls, json_str: str, instruction, desired_story) -> 'PreferenceSet':
        data = json.loads(json_str)
        triples = [(instruction, data['generated_story'], desired_story)]
        return cls(triples)

    def __iter__(self):
        return iter(self.triples)

3. Generating Preference-Response Triplets
This function generates a story using OpenAI’s API and returns a preference triple in the format (instruction, generated response, desired response).
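One step worth looking at in isolation is turning the model's JSON reply into a story string. A reply could, for example, be cut off by the token limit or come back without the expected key, and a plain `json.loads(...)['generated_story']` would then raise. The sketch below shows one defensive way to handle this with the standard library; `parse_generated_story` is a hypothetical helper and the reply strings are hard-coded rather than real API output.

```python
import json
from typing import Optional


def parse_generated_story(raw: str) -> Optional[str]:
    """Return the 'generated_story' string, or None if the reply is unusable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # e.g. a reply cut off mid-string
    if not isinstance(data, dict):
        return None
    story = data.get("generated_story")
    # Accept only a non-empty string.
    return story if isinstance(story, str) and story.strip() else None


good = parse_generated_story('{"generated_story": "A perspicacious owl deliberated."}')
truncated = parse_generated_story('{"generated_story": "A perspic')
wrong_key = parse_generated_story('{"story": "An owl."}')
print(good, truncated, wrong_key)
```

The article's own function, which follows, calls `PreferenceSet.from_json` directly without such a guard.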
def generate_preference_answer_triples(instruction: str, desired_story: str, client: OpenAI) -> List[Tuple[str, str, str]]:
    # The prompt deliberately asks for language too complex for 5-year-olds,
    # so the generated story can serve as the rejected response.
    prompt = f"""Based on the following instruction, generate a story. \
Story should be no longer than 50 words. Story uses several complex words or structures \
that are not suitable for 5-year-olds.
Provide your response in JSON format with the following structure:
{{"generated_story": "..."}}
Instruction:
{instruction}
"""
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a helpful assistant who "
                    "generates a story based on the given instruction. "
                    "Provide your response in JSON format."
                ),
            },
            {"role": "user", "content": prompt},
        ],
        response_format={"type": "json_object"},
        max_tokens=512,
        temperature=0.2,
    )
    result = PreferenceSet.from_json(completion.choices[0].message.content, instruction, desired_story)
    # Return the triples as a plain list of tuples
    return result.triples

4. Creating the Preference Dataset
This function creates a dataset using the extracted stories and generated responses.
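The function relies on a thread pool to issue many API calls at once. The sketch below isolates that fan-out pattern with a stub worker so it can be run without network access. It also shows one possible way to keep going when a single call fails, which is an addition for illustration; `fake_worker` and `collect` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]


def fake_worker(item: int) -> List[Triple]:
    """Stub for one API call; item 3 fails to simulate an error."""
    if item == 3:
        raise ValueError("simulated API failure")
    return [(f"instruction {item}", f"generated {item}", f"desired {item}")]


def collect(items: Iterable[int], num_workers: int = 4):
    results: List[Triple] = []
    failures: List[Tuple[int, str]] = []
    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        # Map each future back to its input so failures can be reported.
        future_to_item = {executor.submit(fake_worker, item): item for item in items}
        # as_completed yields futures as they finish, not in submission order.
        for future in as_completed(future_to_item):
            try:
                results.extend(future.result())
            except Exception as exc:
                failures.append((future_to_item[future], str(exc)))
    return results, failures


results, failures = collect(range(5))
print(len(results), failures)
```

Because each triple carries its own instruction, the completion order does not matter for correctness. The article's `create_preference_dataset`, shown next, uses the same submit and `as_completed` pattern but lets any exception propagate.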
def create_preference_dataset(dataset: Dataset, client: OpenAI, num_workers: int = 4) -> Dataset:
    stories = extract_ground_instruction_story(dataset)
    instruction_answer_triples = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=num_workers) as executor:
        futures = [
            executor.submit(generate_preference_answer_triples, instruction, desired_story, client)
            for instruction, desired_story in stories
        ]
        for future in tqdm(concurrent.futures.as_completed(futures), total=len(futures)):
            instruction_answer_triples.extend(future.result())
    # Each triple is (instruction, generated story, desired story):
    # the generated story becomes "rejected", the desired story "chosen".
    instructions, rejected_story, chosen_story = zip(*instruction_answer_triples)
    return Dataset.from_dict({
        "prompt": list(instructions),
        "rejected": list(rejected_story),
        "chosen": list(chosen_story)
    })

5. The Main Function
This function initializes the OpenAI client, loads the dataset, creates a preference dataset, and uploads it to the Hugging Face Hub.
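One optional step the pipeline does not include is a sanity check on the rows before upload. A minimal sketch, assuming each row is a dictionary with `prompt`, `chosen`, and `rejected` string fields; `find_bad_rows` is a hypothetical helper and the sample rows are invented.

```python
from typing import Dict, List


def find_bad_rows(rows: List[Dict[str, str]]) -> List[int]:
    """Return indices of rows with a missing or empty field, or with chosen equal to rejected."""
    bad = []
    for index, row in enumerate(rows):
        values = [row.get(key, "") for key in ("prompt", "chosen", "rejected")]
        if any(not isinstance(value, str) or not value.strip() for value in values):
            bad.append(index)
        elif row["chosen"].strip() == row["rejected"].strip():
            # Identical responses carry no preference signal.
            bad.append(index)
    return bad


rows = [
    {"prompt": "Tell a story about a duck.", "chosen": "The duck swam.",
     "rejected": "The anatine creature traversed the pond."},
    {"prompt": "Tell a story about a hen.", "chosen": "The hen slept.",
     "rejected": "The hen slept."},
    {"prompt": "Tell a story about a pig.", "chosen": "The pig ate.", "rejected": ""},
]
print(find_bad_rows(rows))
```

Since iterating over a Hugging Face `Dataset` yields one dictionary per row, something like `find_bad_rows(list(preference_dataset))` should work on the real data. The article's `main`, shown next, uploads without such a check.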
def main() -> None:
    client = OpenAI(api_key=userdata.get('OPENAI_API_KEY'))
    # 1. Load the raw data (train and test splits)
    train_dataset = load_dataset("tanquangduong/TinyStories_Instruction", split="train")
    test_dataset = load_dataset("tanquangduong/TinyStories_Instruction", split="test")
    # Combine the two splits into one dataset
    raw_dataset = concatenate_datasets([train_dataset, test_dataset])
    print("Raw dataset:")
    print(raw_dataset.to_pandas())
    # 2. Create preference dataset
    preference_dataset = create_preference_dataset(raw_dataset, client)
    print("Preference dataset:")
    print(preference_dataset.to_pandas())
    # 3. Train/test split and export
    filtered_dataset = preference_dataset.train_test_split(test_size=0.1)
    filtered_dataset.push_to_hub("tanquangduong/TinyStories_Preference")

6. Hugging Face Hub Login
To authenticate with Hugging Face and run the pipeline:
from huggingface_hub import login
# Log in to the Hugging Face Hub
login(token=userdata.get('HF_TOKEN'))
# Launch the pipeline to create the preference dataset
main()

Conclusion
The above code demonstrates how to create a preference dataset for Direct Preference Optimization. By training a language model using a preference dataset, we can better align the model’s outputs with human expectations, thereby enhancing the relevance and quality of AI-generated responses. This approach enables more user-centered AI development and refinement.